A set of one-line text-book-style mathematical expressions is defined
by a context free grammar. This grammar generates strings which
describe the expressions in terms of mathematical symbols and some simple 
positional operators, such as vertical concatenation. The grammar rules 
are processed to abstract information used to drive the parsing scheme. 
This has been called syntax-controlled as opposed to syntax-directed 
analysis. 

The parsing scheme consists of two operations. First, the X-Y 
plane is searched in such a way that the mathematical characters are 
picked up in a unique order. Then, the resulting character string is 
parsed using a precedence algorithm with certain modifications for 
special cases. The search of the X-Y plane is directed by the particular 
characters encountered. 
I. Introduction

No satisfactory method of typing mathematical expressions in a linear
string has as yet been devised. Chapter II of my thesis¹³ shows the
difficult notation I had to use. A good method for communicating
two-dimensional expressions to the computer is needed. Klerer has devised
an algorithm for parsing two-dimensional expressions constructed on a
slightly modified typewriter, but these expressions are not easy to type.
One is therefore led to hope that the input of characters through a stylus
device like the RAND tablet will some day be practical. Existing character
recognition programs⁹ are good enough to begin experiments. Anderson¹
has just constructed an algorithm for parsing mathematical expressions
drawn on a RAND tablet. However, while Klerer's algorithm is quite fast,
Anderson's takes many seconds to parse an expression of moderate size.
There are several reasons for this; partly it is a matter of
implementation. But a very important reason is that Anderson's algorithm is
very general and has more power than is needed for most of the expressions
we expect to encounter. For example, consider how Anderson's top down
syntax directed algorithm would parse:



    (1)    x = \sum_{I=0}^{\infty} x^{2+I}

The program is given an ordered list of syntax rules. Those needed for 
this example would be: 

    1. S -> E = E
    2. E -> T + T
    3. E -> T
    4. T -> 2
    5. T -> X^E
    6. T -> X
    7. T -> \sum_{S}^{E} T
    8. T ->
    9. T -> I

The spatial relationships in the exponent and summation are made
explicit by giving X and Y coordinates. The parsing program must accept
any expression which can be generated by starting with S and substituting
the right side of any rule for its left side in the expression being formed.
The parsing program must determine the sequence of rules used to form the
input expression. Examining the rules in the order given, it would first
try to apply rule 1 by partitioning the remaining characters on either
side of an =. Choosing the rightmost = the program forms two possibilities:



x = and > x 



which must be examples of expressions which can be formed by starting with 

E. Taking the right side first, the + indicates that it might be of the 

form: 





2 A I 

x and 




Proceeding in this manner, the original choice of = is found to be
unacceptable. The parser then tries the second = and forms:

    x    and    \sum_{I=0}^{\infty} x^{2+I}

which will prove to be correct. When an application of rule 7 is tried,
the characters will be partitioned into groups depending on whether they
are above, below, or completely to the right of the Σ.

Now let us see how Klerer's algorithm would handle this same example.
The expression is considered to be a tree branching from left to right.
The program will pick up the leftmost character, the X. Since this is a
letter it will know from rule 5 that it might have an exponent, but a
scan of the appropriate area shows none to be present. It next picks up
the =, forming a string of the characters found. Here the program knows
that only rule 1 can apply so it moves right again, adding the Σ to
the string. From rule 7 the program then knows to search in a similar
manner, first below, then above, and then to the right of the Σ.
It puts marking characters between the characters found in each area. In
this manner the Klerer program forms the character string:

    x = Σ(I = 0, ∞, x ' (2+I))

without any false characters being picked up. This string is then parsed
by an efficient method for linear strings. The Klerer method is
superior on this example, but it can fail on:



           b  - x
          ∫   ---- dx
           a    d

Anderson will recognize this as

    \int_a^b \frac{-x}{d} \, dx



I think Anderson was led to his approach by a) the desire to handle a 
wide variety of notations, b) the belief that the characters would not be 
constrained in size and position as they are on the typewriter and c) 
the desire for a theoretically attractive syntax directed scheme. It
is certainly true that what little documentation Klerer has done makes
his scheme seem very ad hoc.

In this memo I present my version of Klerer's scheme in a systematic
manner, which shows its power and limitations. This makes more apparent
where the power of Anderson's scheme is needed. Using as input a list of
characters and the coordinates of each as shown in Fig. 1, the program
appears to be about 20 times as fast as Anderson's on the examples shown
in Fig. 2. Anderson's program will slow down more than linearly as the
number of characters in the examples is increased, but the Klerer
algorithm will not. On the other hand, Anderson's program contains many
tests for correctly formed syntactic sub-units. Only after my program
has been tested on a RAND tablet can a complete comparison be made
between Anderson's scheme and mine.
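[Figure: a hand-printed expression, with one of its characters, an x,
enclosed in its bounding rectangle. The horizontal coordinates xmin, xcen,
xmax and the vertical coordinates ymin, ycen, ymax of the rectangle are
marked, and the character is coded as the list
(xmin xcen xmax ymin ycen ymax x).]

Fig. 1
A Hand Coded Example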
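
A minimal sketch of how one hand-coded character might be held in a program, assuming the seven-element list of Fig. 1. The `Char` type and the `make_char` helper are hypothetical names, and taking each center as the midpoint of the extremes is an assumption; the memo does not say how xcen and ycen were obtained.

```python
from typing import NamedTuple


class Char(NamedTuple):
    """One character, with fields in the order of the Fig. 1 list."""
    xmin: float
    xcen: float
    xmax: float
    ymin: float
    ycen: float
    ymax: float
    name: str


def make_char(name: str, xmin: float, xmax: float,
              ymin: float, ymax: float) -> Char:
    # Assumption: the centers are the midpoints of the bounding rectangle.
    return Char(xmin, (xmin + xmax) / 2, xmax,
                ymin, (ymin + ymax) / 2, ymax, name)


x = make_char("X", 0, 2, 0, 4)
print(tuple(x))
```

The input to the parser would then be a plain list of such records, one per character, in no particular order.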
    Example                                    Seconds to Parse,
                                               Compiled CTSS LISP

    x + 3 * I                                  .2
    x^2 + i + x  (superscripts partly          .3
                  illegible)
    x! + \sum_{I=2}^{10} (x)^I                 .3
    |x| + y |_{y=2}                            .2
    \frac{d (x+y)}{dx}                         .4
    f_i(x,y)                                   .2
    \int^b x dx                                .2
    x - SIN x                                  .2

Fig. 2
Some Examples Parsed
II. The Grammar

The expressions to be parsed can be described by a context free
grammar. Such a grammar consists of a set of terminal characters, a
set of non-terminal characters and a set of productions of the form A -> δ,
where A is a non-terminal character, and δ is a finite string of terminal
and non-terminal characters. Starting with a specified non-terminal, S*,
the mathematical expressions are described by the strings of terminals
which can be formed by the repeated substitution of the right side of
any production for its left side, until no non-terminals remain in the
string.

Since we are describing two-dimensional mathematical expressions the 
terminal symbols consist of mathematical symbols, parentheses, and the 
following positional operators: 

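    (C= x y)      concatenation            y follows x on the same line
    (V= x y z)    vertical concatenation   x below y, z above y
    (H= x y z)    horseshoe                y at the lower right of x,
                                           z at the upper right of x
    (E= x y)      exponent                 y at the upper right of x
    (S= x y)      subscript                y at the lower right of x
    (B= x y)      bottom                   y at the bottom of x

[The rectangle diagrams drawn beside each operator are not reproduced.]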



The center rectangle of V= is assumed to extend a character width to 
the left of the others; the reason for this will be explained later. C= 
can take an arbitrary number of arguments. 
If one thinks of each mathematical symbol as being enclosed in a
rectangle, then the positional operators show how to combine the
rectangles to form larger ones. For example, the string of terminal symbols
describing \sum_{I=1}^{\infty} x^I would be:

    (C= (V= (C= I = 1) Σ ∞)(E= X I)).

We will call this positional operator notation. The positional operators
defined here will form a left to right tree structure of symbols.

Before we can present our grammar two more complications must be
introduced. First, we associate with each production a production in a
parenthesis grammar which generates the internal LISP representation of
the mathematical expressions. The use of production pairs is a simple
example of the scheme proposed by Donovan.³ For example, a simple
grammar might be:

    1.  S* -> E*                 S* -> E*
    2.  E* -> (PWR H* E*)        E* -> (E= H* E*)
    3.  E* -> X                  E* -> X
    4.  H* -> X                  H* -> X
    5.  H* -> E*                 H* -> (C= LPAR E* RPAR)

where PWR stands for exponentiation and LPAR and RPAR stand for left and
right parenthesis. The generation of the strings to describe (x^x)^x in
the two languages would then proceed as follows:

    LISP notation              Positional Operator Notation         Rule

    S*                         S*
    E*                         E*                                   1
    (PWR H* E*)                (E= H* E*)                           2
    (PWR E* E*)                (E= (C= LPAR E* RPAR) E*)            5
    (PWR (PWR H* E*) E*)       (E= (C= LPAR (E= H* E*) RPAR) E*)    2
    (PWR (PWR X E*) E*)        (E= (C= LPAR (E= X E*) RPAR) E*)     4
    (PWR (PWR X X) E*)         (E= (C= LPAR (E= X X) RPAR) E*)      3
    (PWR (PWR X X) X)          (E= (C= LPAR (E= X X) RPAR) X)       3



Whenever we substitute for a non-terminal in the positional operator string, 
we make the corresponding substitution in the LISP string. Our grammar 
now generates ordered pairs of strings. The first string represents the 
mathematical expression in our LISP notation and the second represents the 
expression in positional operator notation. Once the parsing program has 
found the series of productions which generate the positional operator no- 
tation for an input expression, the same series can then be used to generate 
its LISP representation. 
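
The lockstep substitution described above can be sketched in a few lines. This is only an illustration under stated assumptions: the five production pairs are those of the simple grammar given earlier, each string is held as a list of blank-separated tokens, and every rule is applied to the leftmost non-terminal on both sides (which happens to be sufficient for this grammar). The names `RULES`, `substitute` and `derive` are hypothetical.

```python
# Production pairs: rule number -> (left side, LISP right side, positional right side).
RULES = {
    1: ("S*", "E*", "E*"),
    2: ("E*", "( PWR H* E* )", "( E= H* E* )"),
    3: ("E*", "X", "X"),
    4: ("H*", "X", "X"),
    5: ("H*", "E*", "( C= LPAR E* RPAR )"),
}


def substitute(tokens, lhs, rhs):
    """Replace the leftmost non-terminal (a token ending in *) by rhs."""
    for i, tok in enumerate(tokens):
        if tok.endswith("*"):
            if tok != lhs:
                raise ValueError(f"rule for {lhs} does not apply to {tok}")
            return tokens[:i] + rhs.split() + tokens[i + 1:]
    raise ValueError("no non-terminal left to substitute")


def derive(rule_numbers):
    """Apply the same sequence of rules to both strings of the pair."""
    lisp, positional = ["S*"], ["S*"]
    for n in rule_numbers:
        lhs, lisp_rhs, pos_rhs = RULES[n]
        lisp = substitute(lisp, lhs, lisp_rhs)
        positional = substitute(positional, lhs, pos_rhs)
    return " ".join(lisp), " ".join(positional)


# The rule sequence a parser would recover for (x^x)^x.
lisp, positional = derive([1, 2, 5, 2, 4, 3, 3])
print(lisp)
print(positional)
```

Once the parser has found the rule sequence from the positional string alone, replaying it on the LISP side yields the internal representation without further analysis.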

The same non-terminal symbol might occur twice on the right side of a
production. For example we might have:

    E* -> (PLUS E* E*)           E* -> (C= E* + E*).

In order to distinguish between these two instances when applying the same
substitutions to both sides, they will be subscripted. This is just a
notational convenience. The subscripted form of the rule above is:

    E* -> (PLUS (E* 1)(E* 2))    E* -> (C= (E* 1) + (E* 2)).

Finally, the grammar can be further condensed without any change in its
power if alternative choices and repeated substrings are introduced.⁵ The
repeated strings are useful since functions such as PLUS and TIMES can
take an arbitrary number of arguments. The expression x+x+x could be
generated by the rules:

    S* -> (PLUS X S*)            S* -> (C= X + S*)
    S* -> X                      S* -> X

But this leads to the unsimplified parsing (PLUS X (PLUS X X)). Instead
we will write:

    S* -> (PLUS X (REPEAT 1 X))  S* -> (C= X (REPEAT 1 (+ X)))

The 1 is just used to subscript the REPEAT expression itself for
identification on both sides. Parentheses around the argument of the
REPEAT on the right side indicate that it is a string of characters.
Finally, we introduce OR, so that the rule which may be used to generate
X + X - X is:

    S* -> (PLUS X (REPEAT 1 (OR X (MINUS X))))
    S* -> (C= X (REPEAT 1 (OR (+ X) (- X))))

The OR lets us represent all strings of terms connected by + or - signs
with just one rule. The OR may have any number of arguments but the
corresponding argument of a given OR must be taken on both sides.
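
To make the REPEAT and OR conventions concrete, here is a small sketch that expands both sides of the last rule with the same choices. It is an illustration only: the classes `Seq`, `Or` and `Repeat` and the functions `expand` and `show` are hypothetical, a `Seq` stands for a parenthesized string of characters that is spliced into its parent, and an OR is assumed to appear only directly inside a REPEAT.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class Seq:
    """A string of characters spliced into the enclosing list."""
    items: Tuple[Any, ...]


@dataclass
class Or:
    branches: Tuple[Any, ...]


@dataclass
class Repeat:
    ident: int  # the subscript that ties the two sides together
    body: Any


def expand(node: Any, picks: Dict[int, List[int]]) -> list:
    """Return the items that node contributes to its parent list."""
    if isinstance(node, str):
        return [node]
    if isinstance(node, list):  # a parenthesized expression stays nested
        return [[item for child in node for item in expand(child, picks)]]
    if isinstance(node, Seq):
        return [item for child in node.items for item in expand(child, picks)]
    if isinstance(node, Repeat):
        out = []
        for branch in picks[node.ident]:  # one entry per repetition
            body = node.body.branches[branch] if isinstance(node.body, Or) else node.body
            out.extend(expand(body, picks))
        return out
    raise TypeError(f"unexpected node: {node!r}")


def show(x: Any) -> str:
    return x if isinstance(x, str) else "(" + " ".join(show(i) for i in x) + ")"


LISP = ["PLUS", "X", Repeat(1, Or(("X", ["MINUS", "X"])))]
POS = ["C=", "X", Repeat(1, Or((Seq(("+", "X")), Seq(("-", "X")))))]

picks = {1: [0, 1]}  # REPEAT 1 is taken twice: first OR branch, then second
print(show(expand(LISP, picks)[0]))
print(show(expand(POS, picks)[0]))
```

Because both sides read their choices from the same `picks` table, the corresponding argument of the OR is necessarily taken on both sides.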

The mathematical symbols and operators have been given the names 
shown in Fig. 3. The LISP notation is explained in Fig. 4. A grammar 
for the mathematical expressions used in my thesis is shown in Fig. 5.
    Σ      SIGMA
    =      EQSIGN
    +      PLUSS
    -      DASH
    *      STAR
    ---    QUOTIENT
    |      BAR
    ,      COMMA
    !      EXCLAMATION
    (      LPAR
    )      RPAR
    ∫      INTEGRAL

Fig. 3
Symbol Names
    (PLS A B C)          =  A + B + C
    (PRD A B C)          =  A * B * C
    (PWR A B)            =  A^B
    (DRV A B C D E)      =  \frac{d^{B+D} E}{dA^B \, dC^D}
    (ITG D A B C)        =  \int_A^B C \, dD
    (SUM A B C D)        =  \sum_{A=B}^{C} D
    (EVL A B C D E)      =  E |_{A=B, C=D}
    (NAM A B C)          =  C_{A,B}
    (F A B)              =  F(A,B)
    (NAM A B (F C D))    =  F_{A,B}(C,D)
    (FTL A)              =  A!
    (ABS A)              =  |A|

(Most operators can take any number
of arguments in the obvious manner.)

Fig. 4
LISP Notation



A. I. Memo 145                              Memorandum MAC-M-360



E*   (PLS (OR F* (PRD -1 F*) NIL) (REPEAT 1 (OR (PRD -1 F*) F*)))
         (C= (OR (OR F* (PLUSS F*)) (DASH F*) NIL)
             (REPEAT 1 (OR (DASH F*) (PLUSS F*))))

S*
M*

     E*                                   E*
     (EQN (E* 1) (E* 2))                  (C= (E* 1) EQSIGN (E* 2))
     (PRD (P* 1) (REPEAT 1 P*) (OR C* (P* 2)))
         (C= (P* 1) (REPEAT 1 (STAR P*)) STAR (OR C* (P* 2)))
     C*                                   C*
     (ITG V* (E* 1) (E* 2) (E* 3))
         (C= (H= INTEGRAL (E* 1) (E* 2)) (E* 3) D V*)
     (SUM V* (E* 1) (E* 2) H*)
         (C= (V= (C= V* EQSIGN (E* 1)) SIGMA (E* 2)) H*)
     (PWR R* E*)                          (E= R* E*)
     (ABS E*)                             (C= LBAR E* RBAR)
     (V* E* (REPEAT 1 E*))
         (C= V* LPAR E* (REPEAT 1 (COMMA E*)) RPAR)
     V*                                   V*
     I*                                   I*
     (FACTORIAL R*)                       (C= R* EXCLAMATION)
     (NAM E* (REPEAT 1 E*) V*)
         (S= V* (C= E* (REPEAT 1 (COMMA E*))))
     (PWR (NAM (E* 1) (REPEAT 1 E*) V*) (E* 2))
         (H= V* (C= (E* 1) (REPEAT 1 (COMMA E*))) (E* 2))
     E*                                   (C= LPAR E* RPAR)
     (PRD (E* 1) (PWR (E* 2) -1))         (V= (E* 2) QUOTIENT (E* 1))
     (NAM (E* 1) (REPEAT 1 E*) (V* (E* 2) (REPEAT 2 E*)))
         (C= (S= V* (C= (E* 1) (REPEAT 1 (COMMA E*)))) LPAR
             (E* 2) (REPEAT 2 (COMMA E*)) RPAR)
     (OR (DRV (REPEAT 1 (V* (OR 1 K*))) V*)
         (DRV (REPEAT 1 (V* (OR 1 K*))) H*))
         (OR (V= (C= (REPEAT 1 (D (OR V* (E= V* K*)))))
                 QUOTIENT (C= (E= D (SUM/ (K* 1 I))) V*))
             (C= (V= (C= (REPEAT 1 (D (OR V* (E= V* K*)))))
                     QUOTIENT (E= D (SUM/ (K* 1 I)))) H*))
     K*                                   K*
     (FRT (K* 1) (K* 2))                  (V= (K* 2) QUOTIENT (K* 1))
     (EVL (E* 1) (E* 2) (REPEAT 1 ((E* 1) (E* 2))) H*)
         (C= H* (B= BAR (V= (C= (E* 1) EQSIGN (E* 2))
                            (REPEAT 1 (C= (E* 1) EQSIGN (E* 2))))))
C*   (TRANSCENDENTAL (OR V* I* Q* E*))
         (C= TRANSCENDENTAL (OR (BLANK V*) (BLANK I*)
                                (BLANK Q*) (LPAR E* RPAR)))
     INTEGER                              INTEGER
     LITER                                LITER
     U*                                   U*
     Q*                                   Q*
     H*                                   H*
     (ABS E*)                             (C= LBAR E* RBAR)
     V*                                   V*



Left sides of the productions (column order):
C* C* H* H* H* H* H* H* H* H* H* Q* H* U* I* I* E* K* V* P* P* P* R* R*

Fig. 5.
III. Searching the X-Y Plane

I now want to describe how the parsing algorithm picks up the characters
from the X-Y plane to form a linear string. The program is presented with
a rectangle known to enclose all of the characters and a rule for selecting
one character from the rectangle to be added to the linear string. For
example, in Fig. 6 the program might be given the solid rectangle and
instructions to find the leftmost character in it. The particular character
found determines an ordered list of new rectangles, defined in terms of the
dimensions of the original one and the dimensions of the character found.
A character selecting rule is also associated with each rectangle. In Fig.
6 the divide bar would yield a list of rectangles 1, 2, and 3 and the
instructions to find the leftmost character in 1 and 2 and the leftmost
character falling within the shaded tolerance area in 3. The program then
calls itself recursively on each of these smaller rectangles. When no
character is found in a rectangle, control returns to the next higher level.
I call this a character directed search scheme.
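
The recursion just described can be sketched as follows. This is a minimal sketch under stated assumptions, not the memo's program: the only selection rule is "leftmost character wholly inside the rectangle", a divide bar (named QUOTIENT as in Fig. 3) yields rectangles below, above, and to the right, every other character yields only a rectangle to the right, and no marking characters or tolerance areas are modeled. All names (`Char`, `inside`, `subrectangles`, `search`) are hypothetical, and characters are assumed to have positive width.

```python
from typing import List, NamedTuple, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, xmax, ymin, ymax)


class Char(NamedTuple):
    name: str
    xmin: float
    xmax: float
    ymin: float
    ymax: float


def inside(c: Char, r: Rect) -> bool:
    return r[0] <= c.xmin and c.xmax <= r[1] and r[2] <= c.ymin and c.ymax <= r[3]


def subrectangles(c: Char, r: Rect) -> List[Rect]:
    """Ordered list of new rectangles determined by the character found."""
    right = (c.xmax, r[1], r[2], r[3])
    if c.name == "QUOTIENT":
        below = (c.xmin, c.xmax, r[2], c.ymin)
        above = (c.xmin, c.xmax, c.ymax, r[3])
        return [below, above, right]
    return [right]


def search(r: Rect, chars: List[Char], out: List[str]) -> None:
    found = [c for c in chars if inside(c, r)]
    if not found:
        return  # empty rectangle: control returns to the next higher level
    c = min(found, key=lambda ch: ch.xmin)  # leftmost character
    out.append(c.name)
    for sub in subrectangles(c, r):
        search(sub, chars, out)


# A over B, followed by + C; the bar extends to the left of A and B.
chars = [
    Char("QUOTIENT", 0, 3, 4.9, 5.1),
    Char("A", 1, 2, 6, 8),
    Char("B", 1, 2, 2, 4),
    Char("PLUSS", 4, 5, 4, 6),
    Char("C", 6, 7, 4, 6),
]
order: List[str] = []
search((0, 10, 0, 10), chars, order)
print(order)
```

Note that the bar is found first only because it starts to the left of the numerator and denominator, which is the reason for the width requirement placed on vertically concatenated symbols.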

The manner of defining the subrectangles and their associated selection
rules depends on the particular set of positional operators. First, any
given rectangle contains either a single symbol or is composed of smaller
subrectangles related by one of the positional operators. (Requirement 1)
Subrectangles must always be scanned in the same order. This requirement
is not absolutely necessary; it is only necessary that we get the same
interpretation of the linear string no matter what the order. The linear
string parsing problem will probably be simpler, however, if requirement 1)
can be met. Second, our approach will be to assign a choice of subrectangles
and selection rules to each of the positional operators. When this is done
it is necessary that (Requirement 2) no grammatical string in positional
operator notation can result in the picking up of some of the characters
in a text-book expression defined by a second distinct grammatical string
in positional operator notation. This requirement is to avoid ambiguous
spatial parsings. For example, we do not want to allow an expression like
X^2 Y to have the legal parsings X^2 times Y or X times ^2 Y.

Fig. 6
A Step in the Character Search

Anderson also divides each rectangle into smaller ones, using the
terminal characters as guides. Looking at the rules in his grammar and
algorithm we note that he partitions the current rectangle in all
appropriate ways. To be applicable, a rule may contain only those terminal
characters which are in the current rectangle, and the characters which
occur leftmost and rightmost in the subrectangles must be permitted by the
rule. These tests certainly cut down the number of partitions tried.
(Assertion A) However, for the grammar defined in the first section the
names and dimensions of the characters which can occur leftmost or
rightmost in a rectangle completely determine the subrectangles to be tried.
This is a fortunate situation! Requirement 2) then guarantees that if the
associated selection rule picks any character in a given subrectangle, that
subrectangle applies to the text-book expression at hand.

A little thought should convince the reader that a context free
grammar will generate a tree structure of subexpressions. For example,
the tree structure for the positional operator expression,

    (C= (V= (C= I = 1) Σ ∞)(E= X I))

which was given in part 1 is:

    (C= (V= (C= I = 1) Σ ∞)(E= X I))
    ├── (V= (C= I = 1) Σ ∞)
    │   ├── (C= I = 1)
    │   ├── Σ
    │   └── ∞
    └── (E= X I)
        ├── X
        └── I

A top down parsing scheme, of which Anderson's is an example, picks one
of the forms which the top node can have. For this choice there are
generally only certain possible choices for the lower nodes, and these
can be dependent on the choice for the top node. This can be an
advantage, as Anderson points out, if one of the lower nodes can be discovered
very easily, thus constraining a higher node and reducing the possibilities
for other nodes. For example, in ∫ x dx, which has the tree structure:

    (1) ∫ x dx
          d x

discovery of the integral sign gives us the form of node (1) and then we
know that node (2) is not a product.

The task of parsing is sometimes simplified if the construction at
any node depends only on the nodes below it. This is the situation in a
precedence grammar, as defined by Floyd. This property can be verified
by examining the grammar in question. For the grammar defined in part 1,
it can be verified that the choice of subrectangles is independent of the
context in which the main rectangle appears. Subrectangles and selection rules
can be associated with certain characters without regard to the context in
which the characters occur. These are the properties which make a
character directed search scheme possible.
Assertion A) follows from a simple exhaustive argument for our
particular grammar. First, the leftmost character of a rectangle must belong
to its leftmost subrectangle for each positional operator except for vertical
concatenation, which has no obvious leftmost subrectangle. Vertical
concatenation is used only with the symbol Σ and the divide bar. This is why we
require that these symbols extend further to the left than rectangles with
which they are vertically concatenated. We thus know for each positional
operator which subrectangle is being searched first. Note that when we
pick up a character we are starting simultaneously on all the rectangles
in which it lies leftmost. Next, for each positional operator it will be
possible to tell the positions of additional subrectangles whenever the
first subrectangle is finished. To see this we list the characters which
can occur rightmost in the first subrectangle of each positional operator.

    Operator                       Characters

    concatenation           C=     ( + - LITER INTEGER , Σ ∫ ) ! D
    vertical concatenation  V=     Σ
    horseshoe               H=     ∫ LITER
    exponent                E=     LITER ) D | INTEGER
    subscript               S=     LITER
    bottom                  B=     |

In the case of B=, S=, H=, and V= the first subrectangle encloses a single
character, so it is easy to get the dimensions of the other subrectangle
from the dimensions of this character and those of the main rectangle.
In the case of E= the rightmost character might be a ) or | which is
symmetrical with respect to the center line of the subrectangle, but may not
extend to its complete height. Its vertical dimension will suffice if
we allow the exponent subrectangle of the E= operator to be placed as low as
the top of the rightmost character in the first subrectangle. Finally,
the rightmost character of the first C= subrectangle is used to find the
X-coordinate of the second subrectangle. The X-coordinate of the (n+1)st
subrectangle is found from the rightmost character of the nth subrectangle.
The Y-coordinate is found from the center of the leftmost character.

Notice that if we proceed in this way it is easy to meet requirement
1). The scheme will work if it satisfies requirement 2). Once again we
use an exhaustive argument for the grammar at hand. Our method of starting
the first subrectangle is the same for each positional operator. We must
show that an additional subrectangle can't be started unless the positional
operator in question applies. If a false rectangle were started, then the
first character picked up in it must belong to a second grammatical
construction. That is, there must be some previous character from which
this character can be reached by two grammatical paths. We have already
listed the rightmost characters which can initiate a second subrectangle;
some of them can initiate more than one, either because they apply to an
operator which initiates more than one or because they apply to more than
one operator, or both. These are Σ, the divide bar, LITER, |, ), D, and
INTEGER. We must show that the same character can not be reached by taking
more than one branch from a given character. In the case of Σ and the
divide bar the plane is divided into non-intersecting areas above, below,
and to the right of these characters. For the rest, the second subrectangle
is to the right. To avoid the situation:
           b  - x
          ∫   ---- dx
           a    d

we must make the objectionable requirement that every divide bar or Σ
extend a full character width to the left of all symbols in the rectangles
concatenated with it vertically. When this is done every rectangle will
have its leftmost character at least a character width to the left of the
others; then the operators H=, E=, S=, and C= can take second
subrectangles only when the subrectangle's center lines are in the appropriate
position. Under these conditions requirement 2) is met.

It will probably be necessary to relax requirement 2) in order to 
make the algorithm acceptable to users. We will do this after experience 
indicates the complete range of changes needed. 

We have assumed that concatenation does not apply if the space for
the next symbol is blank. Experimentation with tolerances will be
required to determine when additional machinery will be needed to make this
determination. One case which has already been handled is sin x, where
a blank can be used to separate a transcendental function name from its
simple argument. Since the n in sin alone is not sufficient to indicate
that a blank can be added to the character string, this is a case where the
search scheme can not be character directed. The whole phrase "sin" is
required to make skipping a blank a legal operation. I call this an
example of a phrase directed search scheme. It is especially simple because
the dimensions of the phrase are not needed to guide the search and
because the phrase can be found by a lexical parse.
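
The sin x case can be illustrated with a very small lexical check. This is a sketch under assumptions, not the memo's routine: the list of function names is hypothetical (the memo names only sin), characters are taken to be single upper-case letters or symbols, and `may_skip_blank` is an invented name.

```python
# Hypothetical list of transcendental function names; only SIN is from the memo.
FUNCTION_NAMES = ("SIN", "COS", "LOG")


def may_skip_blank(picked):
    """True if the characters picked up so far end in a complete function name.

    The last character alone (the N of SIN) is not enough to decide; the
    whole phrase has to be recognized before a blank may be skipped.
    """
    tail = "".join(picked)
    return any(tail.endswith(name) for name in FUNCTION_NAMES)


print(may_skip_blank(["X", "-", "S", "I", "N"]))
print(may_skip_blank(["X", "-", "S", "I"]))
```

A check of this kind needs no coordinates, which is why this particular phrase directed step is simple.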
A phrase directed search scheme is one where the characters are picked
up in such an order that the search for the next character depends only on
the grammatical phrases that can be formed using the characters already
picked up. It might be necessary to compute the dimensions of the phrases,
using the formulas which define the positional operators, in order to get
the parameters needed to guide the search. If we relax this definition to
allow back-up, then the expression

           b  - x
          ∫   ---- dx
           a    d

which gave trouble above could be handled with a phrase directed scheme.
We should note that in order to formulate a grammar for matrices a
more complicated search will have to be made when a new rectangle is
entered.
IV. Parsing the String

When the mathematical symbols are picked up by the search algorithm
it is necessary to add some additional symbols to indicate the positional
relationships discovered. The exact symbols used should make the
resulting string parsable by a simple algorithm if at all possible. The actual
symbols used are:

    (C= X Y)             ->  X Y
    (V= X QUOTIENT Y)    ->  QUOTIENT X QMARK1 Y QMARK2
    (V= X SIGMA Y)       ->  SIGMA X SMARK1 Y SMARK2
    (H= INTEGRAL Y Z)    ->  INTEGRAL Y IMARK Z IMARK
    (H= LITER X Y)       ->  LITER SE= X SE=END NE= Y NE=END
    (E= X Y)             ->  X NE= Y NE=END
    (S= X Y)             ->  X SE= Y SE=END
    (B= X Y)             ->  X BMARK N= Y N=END
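
The correspondence listed above can be read as a recursive rewriting of positional operator notation into a marked linear string. The sketch below assumes that reading: the center symbol of V= is taken to be QUOTIENT or SIGMA and the head of the first H= form INTEGRAL (names from Fig. 3), INF is a hypothetical name for the infinity sign, and `linearize` is an invented helper. The memo's own program produces the string during the search rather than from the operator notation, so this is an illustration only.

```python
def linearize(expr):
    """Turn nested positional operator notation into a marked linear string."""
    if isinstance(expr, str):
        return [expr]
    op, *args = expr
    parts = [linearize(a) for a in args]
    if op == "C=":
        return [tok for part in parts for tok in part]
    if op == "V=":
        below, center, above = parts
        # Assumption: the center of V= is a single QUOTIENT or SIGMA symbol.
        marks = {"QUOTIENT": ("QMARK1", "QMARK2"),
                 "SIGMA": ("SMARK1", "SMARK2")}[args[1]]
        return center + below + [marks[0]] + above + [marks[1]]
    if op == "H=":
        head, lower, upper = parts
        if args[0] == "INTEGRAL":
            return head + lower + ["IMARK"] + upper + ["IMARK"]
        return head + ["SE="] + lower + ["SE=END", "NE="] + upper + ["NE=END"]
    if op == "E=":
        base, exponent = parts
        return base + ["NE="] + exponent + ["NE=END"]
    if op == "S=":
        base, subscript = parts
        return base + ["SE="] + subscript + ["SE=END"]
    if op == "B=":
        base, bottom = parts
        return base + ["BMARK", "N="] + bottom + ["N=END"]
    raise ValueError(f"unknown positional operator: {op}")


# The sum of x^I for I from 1 to infinity, in positional operator notation.
summation = ("C=",
             ("V=", ("C=", "I", "EQSIGN", "1"), "SIGMA", "INF"),
             ("E=", "X", "I"))
print(" ".join(linearize(summation)))
```

The center symbol is emitted before its lower and upper parts, matching the order in which the search picks the characters up.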

The use of QMARK, SMARK, and IMARK simplifies the parsing by making it
unnecessary for the left-to-right parser to change state when a divide bar,
Σ, or ∫ is parsed. Applying the search scheme to each rule one can find
the resulting linear string of terminal and non-terminal characters which
would be produced. A parsing algorithm for this transformed set of rules
must then be produced. These were found to form an operator precedence⁶
grammar with two exceptions: |X| and \frac{dx}{dy}. Absolute value was
handled by converting each | to either a left bar (LBAR) or a right bar
(RBAR) as it was encountered. Absolute value thus becomes LBAR X RBAR.
This can be done based on the preceding character, but it represents a
change of state since the preceding character may be either LBAR or RBAR.
Derivatives were handled by looking two characters ahead to find them and
then going into a special state for the duration of the derivative quotient.

It was then possible to calculate precedence functions for the rules
not involving derivatives. Since the method of doing this is not usually
stated, we will give it here. If one has an operator precedence grammar
then each ordered pair of terminal symbols may have no relations or be in
exactly one of the relations >, <, and =. Precedence functions f and
g map the terminal symbols into the integers so that if xRy then f(x) R g(y).
Thus the precedence can be found by comparing the integers assigned to each
terminal symbol. To find f and g, first associate with each terminal
symbol x a second symbol x'. Define new relations G and E as follows.
If x > y, then xGy'. If x < y, then y'Gx. If x = y, then xEy'.
We have eliminated <; now we eliminate E. Arranging the symbols along
the edges of a matrix, unprimed first, any two symbols can be in either
relation E or G. Moving down the rows, if xEy then xRz -> yRz. Use
this to copy the row for x into the row for y and eliminate the row for x.
If xEy and yGx occurs, f and g do not exist. When the relation E has
been eliminated the elements form a lattice under the relation G. One
must now complete the transitive relation G.¹⁴ If xGx occurs for some
x, then f and g do not exist. Otherwise, the symbols can easily be
ordered. Taken in any order the symbols are added one at a time to a list,
being moved to the right until they reach an element with which they are in
relation G. The elements are now assigned an integer corresponding to their
distance from the right end of the list. The integers assigned to unprimed
elements form f and those assigned to primed elements form g. A small
grammar with its precedence table and precedence functions is shown in
Fig. 7.
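
As a cross-check on the procedure, precedence functions can also be computed with the familiar longest-path formulation, which is swapped in here for the list-insertion step described above: symbols related by = are merged, each > or < becomes a directed "greater than" edge between an f-node and a g-node, and every node receives one plus the largest value it must exceed. A cycle means no functions exist. The relation table below is my reading of the small grammar of Fig. 7 and should be treated as an assumption; `precedence_functions` is a hypothetical name, and the integers it yields need not equal those of the figure, since precedence functions are not unique.

```python
def precedence_functions(symbols, rel):
    """rel maps (x, y) to '<', '=' or '>'; returns dictionaries f and g."""
    parent = {(side, s): (side, s) for side in "fg" for s in symbols}

    def find(n):
        while parent[n] != n:
            n = parent[n]
        return n

    # Eliminate the relation E: merge f(x) with g(y) whenever x = y.
    for (x, y), r in rel.items():
        if r == "=":
            parent[find(("f", x))] = find(("g", y))

    # An edge u -> v records that u must receive a larger integer than v.
    greater = {find(n): set() for n in parent}
    for (x, y), r in rel.items():
        fx, gy = find(("f", x)), find(("g", y))
        if r == ">":
            greater[fx].add(gy)
        elif r == "<":
            greater[gy].add(fx)

    value, active = {}, set()

    def rank(n):
        if n in value:
            return value[n]
        if n in active:  # corresponds to xGx after transitive closure
            raise ValueError("no precedence functions exist for this table")
        active.add(n)
        value[n] = 1 + max((rank(m) for m in greater[n]), default=0)
        active.discard(n)
        return value[n]

    f = {s: rank(find(("f", s))) for s in symbols}
    g = {s: rank(find(("g", s))) for s in symbols}
    return f, g


SYMBOLS = ["+", "*", "NUMBER", "LITER", "(", ")"]
REL = {("+", s): "<" for s in ["+", "*", "NUMBER", "LITER", "("]}
REL.update({("*", s): "<" for s in ["*", "NUMBER", "LITER", "("]})
REL[("*", "+")] = ">"
for operand in ["NUMBER", "LITER"]:
    REL.update({(operand, s): ">" for s in ["+", "*", ")"]})
REL.update({("(", "NUMBER"): "<", ("(", "LITER"): "<", ("(", ")"): "="})
REL[(")", "+")] = ">"

f, g = precedence_functions(SYMBOLS, REL)
print(f)
print(g)
```

Whatever integers come out, the property that matters is that f(x) R g(y) holds for every entry xRy of the table.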
           LISP                Positional Operator

    E*     (PLUS E* T*)        (C= T* + E*)
    E*     T*                  T*
    T*     (TIMES T* A*)       (C= A* * T*)
    T*     A*                  A*
    T*     A*                  (C= LPAR A* RPAR)
    A*     NUMBER              NUMBER
    A*     LITER               LITER

                    g     3     9     8     7       6     1
              f           +     *     NUM   LITER   (     )

    +         2           L     L     L     L       L
    *         5           G     L     L     L       L
    NUMBER    11          G     G                         G
    LITER     10          G     G                         G
    (         1                       L     L             =
    )         4           G

Fig. 7
A Small Grammar with its Precedence
Table and Functions
V. Conclusion

The speed of this program indicates that a more complex one which
overcomes its weaknesses will run in a practical time. The approach taken here
should lead to an efficient result and one which can be understood. The
next theoretical step is to construct a parser which will explore the left
to right tree, only backing up in truly difficult cases.